Optimal Asset Allocation for Indian Stock Portfolios: A Risk and Return Analysis¶
Submitted by: Rohan Sam Kuruvilla¶
Student ID: 24006497¶
Introduction¶
The aim of this project is to develop a comprehensive understanding of the factors influencing my investment portfolio's performance and to identify strategies for optimizing returns while managing risk. By leveraging diverse data sources and employing advanced data analysis techniques, the focus is to uncover insights that can guide investment decisions.
The analysis is guided by several key research questions, focusing on the impact of various financial indices, commodities, and exchange rates on portfolio values. Additionally, the volatility of different asset classes, including cryptocurrencies, traditional stocks, gold, and crude oil, will be explored to determine their suitability in a diversified investment portfolio.
By integrating data from multiple sources, including dynamic data from web APIs, and employing advanced data processing techniques, the project presents a thorough and innovative analysis. The results will enhance the understanding of portfolio dynamics and demonstrate the practical application of data analysis and database management skills in real-world financial scenarios.
Research Questions¶
How do fluctuations in various financial indices and commodities influence individual investment portfolio values over time?
What is the impact of the INR/USD exchange rate on the value of investment portfolios in India, considering immediate and lagged effects?
How does the volatility of cryptocurrencies (Bitcoin and Ethereum) compare to traditional assets (stocks, gold, and crude oil) in a diversified investment portfolio, and what is the impact of including these volatile assets on the overall portfolio risk?
What is the optimal asset allocation strategy for maximizing returns while minimizing risk in an Indian stock portfolio?
Out of core Processing: Develop a Trend Indicator for the broader market using the 52-week high of NIFTY 50.
Data Sources¶
The data for this project has been sourced from:
- Portfolio ledger excel downloaded from the stock broker Zerodha.
- Historical Stock Data from Yahoo Finance Library.
- Nifty 50 & Nifty 100 CSV files downloaded from the National Stock Exchange.
- Daily Historical Values for S&P 500, Gold, Crude Oil, Bitcoin, and Ethereum using Twelve Data API.
- Daily Historical Price of Brent Crude from Alpha Vantage using API.
- Live USD - INR Foreign Exchange Rate using Beautiful Soup.
- NSE - Nifty 50 Index Minute data (2015 to 2024) csv file.
By addressing these research questions, the project aims to provide a comprehensive understanding of the factors influencing portfolio performance and to develop strategies for optimal asset allocation and rebalancing. This analysis will help in making informed investment decisions, ensuring that the portfolio is well-positioned to achieve its financial goals while managing risk effectively.
Executive Summary¶
This project is focused on optimizing my personal investment portfolio, consisting primarily of Indian stocks, by exploring the impact of various financial variables and asset classes on portfolio performance. The primary objective was to identify an optimized, diversified portfolio that offers the best return versus risk. Using Ordinary Least Squares (OLS) regression, I analyzed the influence of market indices, cryptocurrencies, commodities like gold and Brent Crude, and the INR/USD exchange rate on the portfolio value. The regression model explained 98.5% of the variability in portfolio value, highlighting the strong impact of these factors on investment outcomes.
The analysis revealed significant positive influences from the NIFTY50 index and Bitcoin, while Ethereum, gold, and Brent Crude had negative impacts. The INR/USD exchange rate significantly increased portfolio value, indicating sensitivity to currency fluctuations. Understanding both immediate and lagged effects of exchange rate changes was crucial, as they highlighted the importance of monitoring and responding to these movements for better investment management.
Through the construction of an efficient frontier, the study visualized the risk-return trade-off for various portfolio combinations. The optimal portfolio, with the highest Sharpe Ratio, predominantly allocated 70% to Indian stocks, 9% to Bitcoin, and 21% to gold. Other assets, including NIFTY50, NIFTY100, Ethereum, S&P 500, INR/USD, and Brent Crude, had negligible weights, indicating their limited contribution to the optimal risk-return trade-off.
Based on these findings, I recommend a diversified investment strategy prioritizing Indian stocks and including allocations in gold and Bitcoin. This strategy aims to maximize returns while minimizing risk, suitable for investors with varying risk tolerances and return objectives. The insights from this analysis can help investors make informed decisions and enhance their portfolio management practices, emphasizing the importance of continual reassessment and adaptation to changing market conditions.
## Import the necessary python libraries
import json
import requests
import squarify
import numpy as np
import pandas as pd
import yfinance as yf
import seaborn as sns
import scipy.optimize as sco
import statsmodels.api as sm
from bs4 import BeautifulSoup
from datetime import datetime
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from matplotlib.colors import LinearSegmentedColormap
## Setting pandas options for formatting the results
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 40)
pd.set_option('display.max_rows', 50)
pd.set_option('display.width', 100)
Data Acquisition¶
DataSet 1: Portfolio Ledger from Broker¶
The primary dataset for this project is the portfolio ledger obtained from Zerodha, a well-known Indian stock broker. This dataset contains detailed information about the investments held in the portfolio, including various stocks ranging from large-cap to micro-cap.
## Read the Broker provided ledger into a dataframe
ledger_df = pd.read_excel('/Users/rohan/Downloads/holdings-IX0319_April.xlsx')
DataSet 2: Historical Stock Data from Yahoo Finance Library¶
The second dataset for this project comprises historical stock data retrieved using the Yahoo Finance library (yfinance). The stock data is taken only for those in the broker ledger. This library is widely used for accessing historical market data and expects stock symbols in a specific format. For non-US stocks, such as those listed on the Indian stock exchanges, an exchange suffix is appended to the stock symbol. For instance, .NS is used for NSE-listed stocks and .BO for BSE-listed stocks.
# Extracting the Historical Data for all Stocks in the Broker Ledger
symbols = ledger_df['Symbol'].tolist()
# Define a function to fetch historical data from Yahoo Finance
def fetch_historical_data(symbols, start_date='2022-01-01', end_date='2024-05-01'):
historical_data = {}
for symbol in symbols:
# Assume that the symbols are listed on the NSE
nse_symbol = f"{symbol}.NS"
# Fetch the historical data
#historical_data[symbol] = yf.download(nse_symbol, start=start_date, end=end_date)
data = yf.download(nse_symbol, start=start_date, end=end_date)
historical_data[symbol] = data
# Check for missing values in the data
if data.isnull().any().any(): # This checks if there's any missing value in any column
print(f"Missing values detected in {symbol}")
return historical_data
# Fetch the historical data for the symbols
historical_stock_data = fetch_historical_data(symbols)
[*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed
# Accessing the historical for a the 2nd stock symbol in dictionary and display its head
stock_symbol = list(historical_stock_data.keys())[1]
print(f"Displaying data for {stock_symbol}:")
historical_stock_data[stock_symbol].head()
Displaying data for ACE:
| Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2022-01-03 | 223.250000 | 225.500000 | 222.449997 | 223.600006 | 222.879272 | 169975 |
| 2022-01-04 | 224.199997 | 226.899994 | 222.199997 | 225.000000 | 224.274750 | 380358 |
| 2022-01-05 | 223.000000 | 225.100006 | 219.149994 | 220.149994 | 219.440384 | 271958 |
| 2022-01-06 | 220.000000 | 224.500000 | 218.000000 | 223.600006 | 222.879272 | 312688 |
| 2022-01-07 | 223.899994 | 234.850006 | 223.899994 | 230.750000 | 230.006210 | 689786 |
¶
DataSet 3: Nifty 50 & Nifty 100 CSV files downloaded from National Stock Exchange¶
The third dataset consists of historical data for the Nifty 50 and Nifty 100 indices, downloaded from the National Stock Exchange (NSE) of India (nseindia.com). These indices are key benchmarks for the Indian equity market, representing the performance of the largest and most liquid Indian companies.
- NIFTY 50: This index is a basket of the top 50 stocks by market capitalization, providing a comprehensive view of the performance of the largest companies in India.
- NIFTY 100: This index is a basket of the top 100 stocks by market capitalization, offering a broader perspective on the Indian stock market by including more companies.
# Path to the CSV files
nifty50_path = '/Users/rohan/Downloads/NIFTY 50_Historical_PR_01012022to01052024.csv'
nifty100_path = '/Users/rohan/Downloads/NIFTY 100_Historical_PR_01012022to01052024.csv'
# Define the date format
date_format = '%d-%b-%y' # Format in csv is "30-Apr-24" format
# Read the NIFTY 50 & 100 data with specified date format
nifty50_data = pd.read_csv(nifty50_path, parse_dates=['Date'], date_format=date_format, index_col='Date')
nifty100_data = pd.read_csv(nifty100_path, parse_dates=['Date'], date_format=date_format, index_col='Date')
print(nifty100_data)
print(nifty50_data)
Index Name Open High Low Close
Date
2024-04-30 NIFTY 100 23499.75 23618.70 23402.75 23436.50
2024-04-29 NIFTY 100 23305.70 23457.90 23250.90 23443.80
2024-04-26 NIFTY 100 23389.20 23392.80 23200.70 23232.10
2024-04-25 NIFTY 100 23099.45 23372.90 23092.35 23329.80
2024-04-24 NIFTY 100 23177.00 23230.65 23148.30 23167.80
... ... ... ... ... ...
2022-01-07 NIFTY 100 18035.70 18140.85 17944.40 18053.95
2022-01-06 NIFTY 100 17997.60 18025.25 17889.75 17985.60
2022-01-05 NIFTY 100 18044.30 18164.70 17974.90 18147.50
2022-01-04 NIFTY 100 17934.50 18051.50 17831.50 18030.95
2022-01-03 NIFTY 100 17654.45 17894.15 17651.45 17874.00
[576 rows x 5 columns]
Index Name Open High Low Close
Date
2024-04-30 NIFTY 50 22679.65 22783.35 22568.40 22604.85
2024-04-29 NIFTY 50 22475.55 22655.80 22441.90 22643.40
2024-04-26 NIFTY 50 22620.40 22620.40 22385.55 22419.95
2024-04-25 NIFTY 50 22316.90 22625.95 22305.25 22570.35
2024-04-24 NIFTY 50 22421.55 22476.45 22384.00 22402.40
... ... ... ... ... ...
2022-01-07 NIFTY 50 17797.60 17905.00 17704.55 17812.70
2022-01-06 NIFTY 50 17768.50 17797.95 17655.55 17745.90
2022-01-05 NIFTY 50 17820.10 17944.70 17748.85 17925.25
2022-01-04 NIFTY 50 17681.40 17827.60 17593.55 17805.25
2022-01-03 NIFTY 50 17387.15 17646.65 17383.30 17625.70
[576 rows x 5 columns]
¶
DataSet 4: Daily Historical Values for S&P 500, Gold, Crude Oil, Bitcoin, and Ethereum using Twelve Data API¶
The fourth dataset involves daily historical values for several key financial instruments and commodities obtained using the Twelve Data API. This data includes the daily prices of S&P 500 index, gold, crude oil, Bitcoin, and Ethereum. These assets represent a diverse mix of traditional and digital assets, offering a broad view of market dynamics.
Key Financial Instruments and Commodities:¶
S&P 500: The S&P 500 is a stock market index that tracks the performance of 500 of the largest companies listed on stock exchanges in the United States. It is widely regarded as one of the best indicators of the overall performance of the U.S. stock market.
Gold: Gold is a precious metal that has been used as a store of value for centuries. It is considered a safe-haven asset, often sought after during times of market volatility and economic uncertainty.
Crude Oil: Crude oil is a critical commodity that serves as the primary raw material for the production of petroleum products. Its price is influenced by global supply and demand dynamics, geopolitical events, and economic conditions.
Bitcoin: Bitcoin is a digital cryptocurrency and a decentralized digital currency without a central bank or single administrator. It can be sent from user to user on the peer-to-peer Bitcoin network without the need for intermediaries.
Ethereum: Ethereum is an open-source, blockchain-based platform that enables developers to build and deploy decentralized applications. Its native cryptocurrency, Ether (ETH), is the second-largest cryptocurrency by market capitalization after Bitcoin.
def fetch_daily_prices(symbol, api_key, start_date='2022-01-01'):
"""Fetch daily historical prices for a specified symbol from Twelve Data."""
end_date = '2024-05-01'
url = 'https://api.twelvedata.com/time_series'
params = {
'symbol': symbol,
'interval': '1day',
'start_date': start_date,
'end_date': end_date,
'apikey': api_key,
'outputsize': '5000',
'format': 'JSON'
}
response = requests.get(url, params=params)
data = response.json()
if 'values' in data:
df = pd.DataFrame(data['values'])
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime', inplace=True)
df['close'] = pd.to_numeric(df['close'])
return df
else:
error_message = data.get('message', 'No Data.')
print(f"Failed to fetch data for {symbol}: {error_message}")
return pd.DataFrame() # Return an empty DataFrame in case of an error
# API Key for Twelve Data
api_key = '0a042250d2cc4d9b8c8606f28f04d07d'
# Fetch data for Bitcoin, Ethereum, S&P 500, Gold, Crude Oil, and USD/INR exchange rate
btc_data = fetch_daily_prices('BTC/USD', api_key)
eth_data = fetch_daily_prices('ETH/USD', api_key)
sp500_data = fetch_daily_prices('SPX', api_key)
gold_data = fetch_daily_prices('XAU/USD', api_key)
usdinr_data = fetch_daily_prices('INR/USD', api_key)
# Displaying the data to verify
print("Bitcoin Daily Prices Data:\n", btc_data.head(1))
print("Ethereum Daily Prices Data:\n", eth_data.head(1))
print("S&P 500 Daily Prices Data:\n", sp500_data.head(1))
print("Gold Daily Prices Data:\n", gold_data.head(1))
print("USD/INR Exchange Rate Data:\n", usdinr_data.head(1))
Bitcoin Daily Prices Data:
open high low close
datetime
2024-05-01 60609.49609 60780.50000 56555.29297 58254.01172
Ethereum Daily Prices Data:
open high low close
datetime
2024-05-01 3013.51000 3023.10000 2816.99000 2971.63
S&P 500 Daily Prices Data:
open high low close volume
datetime
2024-04-30 5103.77979 5110.83008 5035.31006 5035.68994 4082470000
Gold Daily Prices Data:
open high low close
datetime
2024-04-30 2335.10498 2336.54492 2284.57495 2284.57495
USD/INR Exchange Rate Data:
open high low close
datetime
2024-04-30 0.01198 0.01199 0.01197 0.01198
¶
Dataset 5: Daily Historical Price of Brent Crude from Alpha Vantage using API¶
The fifth dataset comprises the daily historical prices of Brent Crude oil, obtained using the Alpha Vantage API. Brent Crude oil is a major benchmark for oil prices worldwide, and its historical price data is essential for understanding its impact on investment portfolios and market dynamics.
Brent Crude oil serves as a critical global benchmark for oil prices. It plays a significant role in the global energy market and is influenced by various factors such as geopolitical events, supply and demand dynamics, and economic conditions.
# Define the API key, symbol, and API endpoint
api_key = '7P1W36NJ7A3TDMJT'
symbol = 'WTI'
url = f'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={symbol}&apikey={api_key}&outputsize=full'
# Send request and load data as JSON
data = requests.get(url).json()
# Process data if retrieval was successful
if 'Time Series (Daily)' in data:
# Convert to DataFrame
df = pd.DataFrame(data['Time Series (Daily)']).T #Transposing rearranges the DataFrame so that each date becomes a row index and open, high, low, close become columns
df.index = pd.to_datetime(df.index)
df = df.sort_index()
# Filter dates to include data from January 1, 2022, to April 30, 2024
crude_oil_data = df.loc['2022-01-01':'2024-05-01']
print(crude_oil_data.head(10000))
else:
# Handle errors or data retrieval issues
print("Error:", data.get("Note", data.get("Error Message", "Unknown error")))
1. open 2. high 3. low 4. close 5. volume 2022-01-03 3.2600 3.5361 3.2350 3.5100 3202137 2022-01-04 3.5700 3.6650 3.5500 3.6100 1609299 2022-01-05 3.6700 3.7500 3.4800 3.4800 1822622 2022-01-06 3.6200 3.7200 3.5200 3.6800 2023118 2022-01-07 3.7000 3.7100 3.5900 3.6300 1070560 ... ... ... ... ... ... 2024-04-25 2.3500 2.3850 2.3000 2.3300 2452300 2024-04-26 2.3200 2.3700 2.3100 2.3500 928544 2024-04-29 2.3500 2.4000 2.3300 2.3800 1351925 2024-04-30 2.3800 2.3800 2.2500 2.2500 1864153 2024-05-01 2.2600 2.2699 2.1900 2.2300 2088563 [585 rows x 5 columns]
¶
Dataset 6: Live USD - INR Foreign Exchange Rate using Beautiful Soup¶
The sixth dataset involves extracting the live USD/INR foreign exchange rate from www.xe.com using web scraping with Beautiful Soup. This exchange rate is crucial for converting the portfolio value from INR to USD, and calculating the as on date value of portfolio in USD.
# Fetching and parsing the exchange rate from XE
response = requests.get( 'https://www.xe.com/currencyconverter/convert/?Amount=1&From=USD&To=INR',
headers={'User-Agent': 'Mozilla/5.0'} )
print (response)
soup = BeautifulSoup(response.text, 'html.parser')
rate_element = soup.find('p', class_='sc-1c293993-1 fxoXHw')
# Extracting and converting the exchange rate
USD_INR_rate = float(rate_element.text.split()[0])
print("USD to INR Exchange Rate:", USD_INR_rate)
<Response [200]> USD to INR Exchange Rate: 83.309741
DATA WRANGLING¶
Transforming Variables¶
Calculation of Current value & Quantity of each stocks in the protfolio¶
In the first step the total quantity of each stock by summing the quantities available, pledged as margin, and pledged as loans. This total is not directly given in the ledger. Then calculated the total investment in each stock and current market values based on stock quantities and their respective prices, providing the initial capital allocation and current worth. Here I also computed the portfolio's weighted average, showing each stock's proportionate impact on the total investment.
The total quantity is calculated by adding the quantity available (unpledged), quantity pledged for margin (as I pledge stocks for generating margins for derivatives trading in Futures & Options), and quantity pledged for loan.
ledger_df.head(4)
| Symbol | ISIN | Sector | Quantity Available | Quantity Discrepant | Quantity Long Term | Quantity Pledged (Margin) | Quantity Pledged (Loan) | Average Price | Previous Closing Price | Unrealized P&L | Unrealized P&L Pct. | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAVAS | INE216P01012 | Financials | 4 | 0 | 0 | 4 | 0 | 1534.1625 | 1619.70 | 684.30 | 5.5755 |
| 1 | ACE | INE731H01025 | Industrials | 1 | 0 | 1 | 0 | 0 | 81.7500 | 1537.20 | 1455.45 | 1780.3670 |
| 2 | ADANIENT | INE423A01024 | Industrials | 1 | 0 | 2 | 4 | 0 | 2998.9400 | 3081.20 | 411.30 | 2.7430 |
| 3 | ADANIPORTS | INE742F01042 | Industrials | 7 | 0 | 2 | 12 | 0 | 872.1737 | 1317.25 | 8456.45 | 51.0307 |
def calculate_financials(ledger_df):
# Calculate total quantity held for each stock
ledger_df['Quantity Held'] = ledger_df['Quantity Available'] + ledger_df['Quantity Pledged (Margin)'] + ledger_df['Quantity Pledged (Loan)']
# Calculate total investment and current value
ledger_df['Total Investment'] = ledger_df['Quantity Held'] * ledger_df['Average Price']
ledger_df['Current Value'] = ledger_df['Quantity Held'] * ledger_df['Previous Closing Price']
# Calculate the weighted average of each stock as a percentage of the total investment
ledger_df['Weighted Average'] = ledger_df['Total Investment'] / (ledger_df['Total Investment'].sum()) * 100
# Rename 'Average Price' to 'Average Buy Price'
ledger_df.rename(columns={'Average Price': 'Average Buy Price'}, inplace=True)
# Select only the desired columns to retain in the final DataFrame
final_columns = ['Symbol', 'Sector', 'Quantity Held', 'Average Buy Price', 'Previous Closing Price', 'Total Investment', 'Current Value', 'Weighted Average']
final_df = ledger_df[final_columns]
return final_df
cleaned_ledger_df=calculate_financials(ledger_df)
cleaned_ledger_df
| Symbol | Sector | Quantity Held | Average Buy Price | Previous Closing Price | Total Investment | Current Value | Weighted Average | |
|---|---|---|---|---|---|---|---|---|
| 0 | AAVAS | Financials | 8 | 1534.1625 | 1619.70 | 12273.3000 | 12957.60 | 0.975630 |
| 1 | ACE | Industrials | 1 | 81.7500 | 1537.20 | 81.7500 | 1537.20 | 0.006498 |
| 2 | ADANIENT | Industrials | 5 | 2998.9400 | 3081.20 | 14994.7000 | 15406.00 | 1.191959 |
| 3 | ADANIPORTS | Industrials | 19 | 872.1737 | 1317.25 | 16571.3003 | 25027.75 | 1.317286 |
| 4 | AFFLE | Communication Services | 3 | 1115.1833 | 1105.75 | 3345.5499 | 3317.25 | 0.265945 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 67 | TATASTEEL | Materials | 185 | 119.9771 | 167.40 | 22195.7635 | 30969.00 | 1.764386 |
| 68 | TCS | Information Technology | 13 | 3160.7269 | 3870.60 | 41089.4497 | 50317.80 | 3.266284 |
| 69 | TVSMOTOR | Consumer Discretionary | 7 | 927.7643 | 2036.60 | 6494.3501 | 14256.20 | 0.516249 |
| 70 | WIPRO | Information Technology | 22 | 450.8932 | 462.95 | 9919.6504 | 10184.90 | 0.788533 |
| 71 | ZOMATO | Communication Services | 57 | 82.8000 | 193.60 | 4719.6000 | 11035.20 | 0.375171 |
72 rows × 8 columns
Calculating Historical Portfolio Values in INR & USD¶
To conduct a comprehensive analysis, it's important to have all values in a single currency. In this case, converting the historical portfolio values from INR to USD will help simplify the analysis and allow for consistent comparisons. So here we finding out the currentl portfolio value in USD and INR using the live INR/USD rate from www.xe.com using web scraping with Beautiful Soup
def calculate_daily_portfolio_values(historical_data, ledger_df):
portfolio_values = pd.DataFrame(index=historical_data[list(historical_data.keys())[0]].index)
portfolio_values['Portfolio Value'] = 0
for symbol, data in historical_data.items():
quantity = ledger_df[ledger_df['Symbol'] == symbol]['Quantity Held'].iloc[0]
portfolio_values['Portfolio Value'] += data['Close'] * quantity
return portfolio_values
# Calculate daily portfolio values
portfolio_values = calculate_daily_portfolio_values(historical_stock_data, cleaned_ledger_df).round(2)
portfolio_values_adjusted = (calculate_daily_portfolio_values(historical_stock_data, cleaned_ledger_df)/ USD_INR_rate).round(2)
# Print the latest portfolio values
latest_portfolio_value = portfolio_values.iloc[-1]['Portfolio Value']
print("Current portfolio value in INR:", latest_portfolio_value)
latest_portfolio_value_USD = portfolio_values_adjusted.iloc[-1]['Portfolio Value']
print("Current portfolio value in USD:", latest_portfolio_value_USD)
Current portfolio value in INR: 1856519.3 Current portfolio value in USD: 22284.54
Concatinating the Data after Reindexing on Date¶
In the analysis, the focus is only on the daily closing prices for NIFTY50,NIFTY100, Bitcoin, Ethereum, S&P 500, Brent Crude, USD INR value and Gold. All other data columns are disregarded to maintain a clear focus on closing values, which are pivotal for the study's assessment of market trends and price movements.
To ensure a common index, by taking the union of indices al the DataFrames (nifty50, nifty100, btc, eth, Gold, Crude, S&P500, Portfolio Values). This step will ensure that the index includes all unique date from all the different datasets. Absent fields would be NaN. Then each DataFrame is reindexed to a common index. This aligns all DataFrames to the same set of dates, introducing NaN values for any date not which present in a given DataFrame. After reindexing, all the DataFrames are concatenated to one dataframe, which aligns the data from each source by the common dates, allowing for direct comparison across different metrics.
# Extracting closing prices from each dataset
nifty50_close = nifty50_data[['Close']].rename(columns={'Close': 'NIFTY50'})
nifty100_close = nifty100_data[['Close']].rename(columns={'Close': 'NIFTY100'})
btc_close = btc_data[['close']].rename(columns={'close': 'BITCOIN'})
eth_close = eth_data[['close']].rename(columns={'close': 'ETHEREUM'})
sp500_close = sp500_data[['close']].rename(columns={'close': 'S&P 500'})
gold_close = gold_data[['close']].rename(columns={'close': 'GOLD'})
usdinr_close = usdinr_data[['close']].rename(columns={'close': 'INR/USD'})
crude_oil_close = crude_oil_data[['4. close']].rename(columns={'4. close': 'Brent Crude'})
# Check and align indices by forming a common index from all DataFrames
common_index = pd.Index.union(nifty50_close.index, nifty100_close.index)
common_index = common_index.union(btc_close.index)
common_index = common_index.union(eth_close.index)
common_index = common_index.union(sp500_close.index)
common_index = common_index.union(gold_close.index)
common_index = common_index.union(crude_oil_close.index)
common_index = common_index.union(usdinr_close.index)
common_index = common_index.union(portfolio_values_adjusted.index)
# Reindex all DataFrames to this common index
nifty50_close = nifty50_close.reindex(common_index)
nifty100_close = nifty100_close.reindex(common_index)
btc_close = btc_close.reindex(common_index)
eth_close = eth_close.reindex(common_index)
sp500_close = sp500_close.reindex(common_index)
gold_close = gold_close.reindex(common_index)
usdinr_close = usdinr_close.reindex(common_index)
crude_oil_close = crude_oil_close.reindex(common_index)
portfolio_values_adjusted = portfolio_values_adjusted.reindex(common_index)
# Now concatenate them
combined_data = pd.concat([portfolio_values_adjusted, nifty50_close, nifty100_close, btc_close, eth_close, sp500_close, gold_close, usdinr_close, crude_oil_close], axis=1)
# Display the combined data
combined_data.head()
| Portfolio Value | NIFTY50 | NIFTY100 | BITCOIN | ETHEREUM | S&P 500 | GOLD | INR/USD | Brent Crude | |
|---|---|---|---|---|---|---|---|---|---|
| 2022-01-01 | NaN | NaN | NaN | 47686.81250 | 3765.62012 | NaN | NaN | NaN | NaN |
| 2022-01-02 | NaN | NaN | NaN | 47345.21875 | 3829.55005 | NaN | NaN | 0.01343 | NaN |
| 2022-01-03 | 15500.81 | 17625.70 | 17874.00 | 46458.11719 | 3763.79004 | 4796.56006 | 1800.84875 | 0.01344 | 3.5100 |
| 2022-01-04 | 15631.55 | 17805.25 | 18030.95 | 45897.57422 | 3785.87012 | 4793.54004 | 1814.56006 | 0.01342 | 3.6100 |
| 2022-01-05 | 15764.48 | 17925.25 | 18147.50 | 43569.00391 | 3537.90991 | 4700.58008 | 1809.53418 | 0.01344 | 3.4800 |
Handling Missing Values¶
Missing data may occur for instruments traded on stock exchanges due to market closures on public or national holidays. For cryptocurrencies, however, missing data should generally be absent because they are traded 24/7 and do not close for holidays. For any missing values we will use the forward fill as it propagates the last non-null value ahead until another missing value is encountered. This approach is suitable for stock market data as it is safe to assume that the last observed price is valid until a new price is quoted. Backward fill is also used in this dataset for the first two days of 2022 as Markets opened only on Jan 3, 2022 after new year and weekend. Backward fill is done after forward filling the missing values.
The individual stock data is checked for missing values during the import step from the Yahoo Finance library, and we confirmed there were no missing values. However, when we combine the portfolio data with international market data, we may encounter missing values due to the different trading days/holidays between Indian exchanges and global markets.
# Identifying and counting missing fields in each column
missing_data_count = combined_data.isna().sum()
print("Missing Data Count in Each Column:")
print(missing_data_count)
Missing Data Count in Each Column: Portfolio Value 280 NIFTY50 276 NIFTY100 276 BITCOIN 0 ETHEREUM 0 S&P 500 268 GOLD 247 INR/USD 188 Brent Crude 267 dtype: int64
# Forward filling missing data in the specified columns
combined_data['Portfolio Value'].ffill(inplace=True)
combined_data['NIFTY50'].ffill(inplace=True)
combined_data['NIFTY100'].ffill(inplace=True)
combined_data['S&P 500'].ffill(inplace=True)
combined_data['GOLD'].ffill(inplace=True)
combined_data['INR/USD'].ffill(inplace=True)
combined_data['Brent Crude'].ffill(inplace=True)
# Backward filling for the first days of 2022
combined_data['Portfolio Value'].bfill(inplace=True)
combined_data['NIFTY50'].bfill(inplace=True)
combined_data['NIFTY100'].bfill(inplace=True)
combined_data['S&P 500'].bfill(inplace=True)
combined_data['GOLD'].bfill(inplace=True)
combined_data['INR/USD'].bfill(inplace=True)
combined_data['Brent Crude'].bfill(inplace=True)
# Displaying missing data count after imputation to verify
missing_data_count_after = combined_data.isna().sum()
print("Missing Data Count in Each Column After Imputation:\n", missing_data_count_after)
# Convert all columns to numeric, coercing errors will turn non-convertible values into NaN
combined_data = combined_data.apply(pd.to_numeric, errors='coerce')
combined_data.head(5)
Missing Data Count in Each Column After Imputation: Portfolio Value 0 NIFTY50 0 NIFTY100 0 BITCOIN 0 ETHEREUM 0 S&P 500 0 GOLD 0 INR/USD 0 Brent Crude 0 dtype: int64
| Portfolio Value | NIFTY50 | NIFTY100 | BITCOIN | ETHEREUM | S&P 500 | GOLD | INR/USD | Brent Crude | |
|---|---|---|---|---|---|---|---|---|---|
| 2022-01-01 | 15500.81 | 17625.70 | 17874.00 | 47686.81250 | 3765.62012 | 4796.56006 | 1800.84875 | 0.01343 | 3.51 |
| 2022-01-02 | 15500.81 | 17625.70 | 17874.00 | 47345.21875 | 3829.55005 | 4796.56006 | 1800.84875 | 0.01343 | 3.51 |
| 2022-01-03 | 15500.81 | 17625.70 | 17874.00 | 46458.11719 | 3763.79004 | 4796.56006 | 1800.84875 | 0.01344 | 3.51 |
| 2022-01-04 | 15631.55 | 17805.25 | 18030.95 | 45897.57422 | 3785.87012 | 4793.54004 | 1814.56006 | 0.01342 | 3.61 |
| 2022-01-05 | 15764.48 | 17925.25 | 18147.50 | 43569.00391 | 3537.90991 | 4700.58008 | 1809.53418 | 0.01344 | 3.48 |
# Setting the color palette paste
sns.set_palette("pastel")
# List of variables to plot
variables = combined_data.columns
# Number of variables
num_vars = len(variables)
# Create subplots: 2 columns for each variable (histogram and box plot)
fig, axes = plt.subplots(nrows=num_vars, ncols=2, figsize=(15, 5 * num_vars))
# Loop through each variable
for i, var in enumerate(variables):
# Histogram
sns.histplot(combined_data[var].dropna(), ax=axes[i, 0], kde=True, color="orange")
axes[i, 0].set_title(f'Histogram of {var}', fontsize=12)
# Box plot
sns.boxplot(y=combined_data[var].dropna(), ax=axes[i, 1], color="lightcoral")
axes[i, 1].set_title(f'Box Plot of {var}', fontsize=12)
# Adjust layout
plt.tight_layout()
plt.show()
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
/opt/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
with pd.option_context('mode.use_inf_as_na', True):
(not splitting the plot on purpose for seperate variables, enabled scrolling for ease of view)
Findings from Distribution of variables¶
- Stock Indices (NIFTY50, NIFTY100, S&P 500): The right skewness and presence of outliers in stock indices can be attributed to periods of rapid market growth and occasional sharp declines due to economic downturns or market corrections.
- Cryptocurrencies (Bitcoin, Ethereum): The significant right skewness and high number of outliers reflect the speculative nature and rapid technological advancements in the cryptocurrency market, leading to large price fluctuations.
- Brent Crude): The volatility in Brent Crude prices is driven by geopolitical events, changes in production levels by OPEC and non-OPEC countries, and global economic conditions.
- Gold: Gold’s role as a safe-haven asset leads to more stable prices with fewer outliers. Its value is often driven by investor demand during economic uncertainties and inflationary periods.
- INR/USD Exchange Rate: The relative stability in the INR/USD exchange rate can be due to the interventions by the Reserve Bank of India (RBI) to manage excessive volatility and maintain economic stability.
These observations provide a comprehensive understanding of the behavior and characteristics of different financial variables in the dataset.
Portfolio Value Visualization With Exponential Moving Average using an Interactive Plot¶
The interactive graph highlights the performance of the equity portfolio from January 2022 to May 2024. It visualizes the portfolio value over time using an interactive line plot. Additionally, we will calculate and display the 30-day Exponential Moving Average (EMA) to help identify trends in the portfolio's performance. The interactive plot will allow us to explore the data in more detail, providing insights into the fluctuations and overall trend of the portfolio value.
What is a 30-Day Exponential Moving Average (EMA)?¶
A 30-day Exponential Moving Average (EMA) is a type of moving average that places a greater weight and significance on the most recent data points. It is used to smooth out short-term fluctuations and highlight longer-term trends in time series data. The 30-day EMA calculates the average of data points over the past 30 days, giving more weight to recent observations, which makes it more responsive to new information compared to a simple moving average (SMA).
In this code the .ewm(span=30, adjust=False).mean() function calculates the 30-day EMA for the 'Portfolio Value' column. This is a vectorized operation, meaning it computes the EMA for the entire column at once rather than iterating through each row.
# Function to plot interactive portfolio value with 30-day EMA
def plot_interactive_portfolio(data):
# Calculate the 30-day EMA
ema_30 = data['Portfolio Value'].ewm(span=30, adjust=False).mean()
# Create an interactive line plot of portfolio values
fig = go.Figure()
fig.add_trace(go.Scatter(x=data.index, y=data['Portfolio Value'], mode='lines', name='Portfolio Value'))
fig.add_trace(go.Scatter(x=data.index, y=ema_30, mode='lines', name='30-Day EMA', line=dict(dash='dash')))
# Update layout for better readability and increased height
fig.update_layout(
title='Interactive Portfolio Value',
xaxis_title='Date',
yaxis_title='Value (USD)',
hovermode='x', # Crosshair along the x-axis
height=520,
width=1100
)
# Show the figure
fig.show()
# Call the function to plot
plot_interactive_portfolio(combined_data)
Interpretation of the portfolio value trend:¶
Initial Period (Jan 2022 - Jul 2022):
- The portfolio value starts at approximately $16,000 and there is a downward trend reaching a low of 12k around June 2022
- The early part of 2022 saw increased geopolitical tensions, particularly with the conflict between Russia and Ukraine starting in February 2022. This conflict led to market instability and global economic uncertainty.
Mid Period (Jul 2022 - Jan 2023):
- The portfolio experienced a recovery and a steady upward trend reaching around 15,000 August 2022
- August 2022 to January was period of consolidation at $15,000 levels.
- The 30-day EMA rises gradually, indicating consistent gains over this period.
Fluctuating Growth (Jan 2023 - Jan 2024):
- The portfolio value shows significant growth with some volatility, peaking and then correcting multiple times.
- Despite fluctuations, general trend remained upward, with the portfolio value reaching around $21,000 by January 2024.
- The 30-day EMA shows these fluctuations but maintains an overall upward trend, indicating that the portfolio is generally growing despite the short-term ups and downs.
Recent Period (Jan 2024 - Apr 2024):
- The portfolio value continues to rise, showing a significant increase to approximately $22,000.
- The 30-day EMA follows this upward trend, confirming the consistent positive performance of the portfolio in the most recent months.
- The recent strong upward trend suggests a positive outlook for the portfolio in the near future.
¶
Sector Wise Allocation of Portfolio¶
Sector-wise allocation of a portfolio involves distributing investments across different industry sectors such as technology, healthcare, finance, consumer goods, energy, and more. This approach helps diversify the portfolio and manage risk by spreading investments across various areas of the economy. By analyzing the sector-wise allocation, investors can understand how their portfolio is distributed among various sectors and identify any overexposure to particular sectors.
def aggregate_by_sector(ledger_df, usd_inr_rate):
# Group data by 'Sector' and perform calculations
sector_data = ledger_df.groupby('Sector').agg(
Number_of_Companies=('Symbol', 'count'), # Count of companies per sector
Total_Investment_INR=('Total Investment', 'sum'), # Total investment per sector
Current_Value_INR=('Current Value', 'sum'), # Sum of current values per sector in INR
)
sector_data['Current_Value_USD'] = sector_data['Current_Value_INR'] / usd_inr_rate
# Calculate weighted average & CAGR of each sector as a percentage of the total current value
sector_data['Weighted_Average'] = sector_data['Current_Value_USD'] / sector_data['Current_Value_USD'].sum() * 100
sector_data['CAGR'] = ((sector_data['Current_Value_INR'] / sector_data['Total_Investment_INR']) ** (1 / 2.33) - 1) * 100 # 2.33 years from 2022 to April 30, 2024
sector_data = sector_data.sort_values(by='Weighted_Average', ascending=False)# Sort the DataFrame by the 'Weighted_Average' column in descending order
return sector_data
sector_based_df = aggregate_by_sector(cleaned_ledger_df, USD_INR_rate)
# Apply formatting for display
formatted_sector_df = sector_based_df.style.format({
'Total_Investment_INR': '{:,.2f}', 'Current_Value_INR': '{:,.2f}',
'Current_Value_USD': '{:,.2f}', 'Weighted_Average': '{:.2f}%', 'CAGR': '{:.2f}%'
})
formatted_sector_df
| Number_of_Companies | Total_Investment_INR | Current_Value_INR | Current_Value_USD | Weighted_Average | CAGR | |
|---|---|---|---|---|---|---|
| Sector | ||||||
| Energy | 4 | 142,346.55 | 411,227.60 | 4,936.13 | 22.49% | 57.67% |
| Financials | 17 | 308,626.20 | 372,186.85 | 4,467.51 | 20.36% | 8.37% |
| Materials | 14 | 259,054.29 | 302,431.75 | 3,630.21 | 16.54% | 6.87% |
| Consumer Staples | 6 | 208,899.40 | 219,307.10 | 2,632.43 | 12.00% | 2.11% |
| Information Technology | 5 | 147,779.55 | 168,877.70 | 2,027.11 | 9.24% | 5.89% |
| Consumer Discretionary | 6 | 61,422.60 | 123,690.25 | 1,484.70 | 6.77% | 35.04% |
| Industrials | 8 | 51,383.25 | 91,519.45 | 1,098.54 | 5.01% | 28.11% |
| Communication Services | 5 | 47,605.40 | 84,869.75 | 1,018.73 | 4.64% | 28.16% |
| ETF | 2 | 16,680.36 | 26,101.56 | 313.31 | 1.43% | 21.19% |
| Utilities | 2 | 6,645.15 | 18,927.00 | 227.19 | 1.04% | 56.71% |
| Health Care | 3 | 7,544.95 | 9,123.90 | 109.52 | 0.50% | 8.50% |
# Define the width of the bars
bar_width = 0.40
# Create an array of indices for the sectors
indices = np.arange(len(sector_based_df))
# Create the plot
fig, ax = plt.subplots(figsize=(11.5, 5.7))
# Plotting Weighted Average & CAGR
bars1 = ax.bar(indices - bar_width / 2, sector_based_df['Weighted_Average'], color='skyblue', width=bar_width, label='Weighted Average')
bars2 = ax.bar(indices + bar_width / 2, sector_based_df['CAGR'], color='orange', width=bar_width, label='CAGR')
# Adding labels and title
ax.set_xlabel('Sector')
ax.set_ylabel('Percentage (%)')
ax.set_title('Weighted Average and CAGR by Sector')
ax.set_xticks(indices)
ax.set_xticklabels(sector_based_df.index, rotation=60)
ax.legend()
# Adding annotations in grey color
for bar in bars1:
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', color='grey')
for bar in bars2:
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', color='grey')
# Removing y-axis markings
ax.yaxis.set_ticks([])
ax.yaxis.set_ticklabels([])
# Show plot
plt.tight_layout()
plt.show()
Summary of Sector-Wise Portfolio Allocation¶
- Highest Allocation: The Energy sector has the highest total current value in USD ($4,933.50) and the highest weighted average allocation (22.49%).
- Significant Growth: The Energy and Utilities sector have a very high CAGR, indicating strong performance over the period, however the allocation of utilities is only 1.04%.
- Diversified Portfolio: The portfolio is diversified across multiple sectors, with Financials and Materials also having significant allocations and steady growth.
- High Growth Sectors: Consumer Discretionary, Industrials, and Communication Services show high growth rates (CAGR) despite smaller allocations.
- Smaller Allocations: ETFs, Utilities, and Health Care have smaller allocations but some sectors like Utilities show impressive growth (CAGR of 56.71%).
Also it is noted that sectors such as Finance and Materials have 17 and 14 stocks respectively, which suggests a potential over-diversification.
Correlation Matrix¶
Conducting a correlation analysis is a fundamental step in the Exploratory Data Analysis (EDA) as it helps me identify the strength and direction of linear relationships between different variables in the dataset. Understanding these relationships is crucial for building effective predictive models and for making informed investment decisions. In this case, it is particularly important to determine how various financial instruments (e.g., stocks, cryptocurrencies, commodities, and exchange rates) are interrelated and how they collectively influence the overall portfolio value. Correlation matrix will be depicted using a heat map. 30 day exponential Moving Average has been taken out from the correlation calculation.
# Calculate the correlation matrix
correlation_matrix = combined_data.corr()
print("Correlation Matrix:")
correlation_matrix
Correlation Matrix:
| Portfolio Value | NIFTY50 | NIFTY100 | BITCOIN | ETHEREUM | S&P 500 | GOLD | INR/USD | Brent Crude | |
|---|---|---|---|---|---|---|---|---|---|
| Portfolio Value | 1.000000 | 0.982896 | 0.986885 | 0.719202 | 0.492861 | 0.852724 | 0.721823 | -0.540189 | -0.713140 |
| NIFTY50 | 0.982896 | 1.000000 | 0.993598 | 0.661028 | 0.442587 | 0.826104 | 0.711725 | -0.605822 | -0.669607 |
| NIFTY100 | 0.986885 | 0.993598 | 1.000000 | 0.699488 | 0.485621 | 0.838195 | 0.706442 | -0.555462 | -0.660118 |
| BITCOIN | 0.719202 | 0.661028 | 0.699488 | 1.000000 | 0.922343 | 0.894370 | 0.794760 | 0.029165 | -0.710080 |
| ETHEREUM | 0.492861 | 0.442587 | 0.485621 | 0.922343 | 1.000000 | 0.799055 | 0.617464 | 0.318997 | -0.588033 |
| S&P 500 | 0.852724 | 0.826104 | 0.838195 | 0.894370 | 0.799055 | 1.000000 | 0.783570 | -0.138667 | -0.789609 |
| GOLD | 0.721823 | 0.711725 | 0.706442 | 0.794760 | 0.617464 | 0.783570 | 1.000000 | -0.327187 | -0.732028 |
| INR/USD | -0.540189 | -0.605822 | -0.555462 | 0.029165 | 0.318997 | -0.138667 | -0.327187 | 1.000000 | 0.205373 |
| Brent Crude | -0.713140 | -0.669607 | -0.660118 | -0.710080 | -0.588033 | -0.789609 | -0.732028 | 0.205373 | 1.000000 |
# Define a custom colormap with sky blue, white, and yellow
colors = ['skyblue', 'white', 'orange']
n_bins = 100 # Discretizes the interpolation into bins
cmap_name = 'custom_cmap'
custom_cmap = LinearSegmentedColormap.from_list(cmap_name, colors, N=n_bins)
# Plot the correlation matrix heatmap with the custom colormap
plt.figure(figsize=(12, 4))
sns.heatmap(correlation_matrix, annot=True, cmap=custom_cmap, linewidths=0.5, fmt='.2f')
plt.title('Correlation Matrix Heatmap')
plt.show()
Findings from Correlation:¶
The correlation analysis reveals several key relationships between the portfolio value and various financial instruments:
Strong Positive Correlations:
- NIFTY50 and NIFTY100: The portfolio value has a very high positive correlation with both the NIFTY50 (0.985) and NIFTY100 (0.984) indices, indicating that the portfolio's performance is heavily influenced by the Indian stock market.
- S&P 500: There is a strong positive correlation (0.85) between the portfolio value and the S&P 500, suggesting that the US stock market also significantly impacts the portfolio.
Moderate Positive Correlations:
- Bitcoin: The portfolio value shows a moderate positive correlation (0.720) with Bitcoin, indicating that cryptocurrency movements can affect the portfolio.
- Ethereum: The portfolio value has a moderate correlation (0.49) with Ethereum, indicating an inverse relationship. This could imply that the portfolio has elements that are negatively impacted by Ethereum's price movements.
Negative Correlations:
- Gold: Gold has a moderate negative correlation (-0.510) with the portfolio value, suggesting that gold prices tend to move inversely to the portfolio's performance.
- Brent Crude: Brent Crude also shows a negative correlation (-0.595) with the portfolio, indicating that rising crude oil prices might adversely affect the portfolio.
- INR/USD Exchange Rate: A negative correlation (0.605) with the INR/USD exchange rate suggests that fluctuations in the strength of the Indian Rupee against the US Dollar influences the portfolio.
Relationships Among Assets:
- Cryptocurrencies: Bitcoin and Ethereum have a strong positive correlation (0.818) with each other, meaning they often move in tandem.
- Indices and Commodities: The correlations between indices (NIFTY50, NIFTY100, S&P 500) and commodities (Gold, Brent Crude) highlight complex relationships where indices generally have weaker correlations with commodities.
The portfolio's value is primarily driven by the performance of Indian and US stock markets, while cryptocurrencies and exchange rates also play significant roles. Gold and Brent Crude exhibit inverse relationships with the portfolio, which may suggest potential hedging benefits. Understanding these correlations is crucial for optimizing the portfolio's risk-return profile and making informed investment decisions.
Data Analysis¶
Research Question 1.¶
How do fluctuations in various financial indices and commodities influence individual investment portfolio values over time?¶
This question aims to explore the sensitivity of portfolio values to changes in major stock indices, which have been shown to have significant correlation coefficients.
Ordinary Least Squares Regression¶
To address this question, a regression analysis is conducted using the combined dataset that includes various independent variables such as market indices (Nifty50, Nifty100, S&P 500), cryptocurrencies (BTC, ETH), commodities (Gold, Crude Oil), and the USD/INR exchange rate. The Ordinary Least Squares (OLS) method from the statsmodels library is utilized for this analysis. This approach aims to determine how different global economic factors impact the values of my portfolio and to identify which factors contribute most to the portfolio's risk.
In this regression model, I used the actual values of the market indices (like NIFTY50, BTC, etc.) instead of their returns. This means that I can now look at how the current values of these indices directly affect the portfolio value. The straightforward impact of these indices on the portfolio without converting them into percentage changes is seen.
# Defining the dependent variable and independent variables
Y = combined_data['Portfolio Value']
X = combined_data[['NIFTY50', 'BITCOIN', 'ETHEREUM', 'S&P 500', 'GOLD', 'INR/USD', 'Brent Crude']]
# Adding a constant to the model (intercept)
X = sm.add_constant(X)
# Creating the OLS model
model = sm.OLS(Y, X).fit()
results = model.summary()
print(results)
OLS Regression Results
==============================================================================
Dep. Variable: Portfolio Value R-squared: 0.985
Model: OLS Adj. R-squared: 0.985
Method: Least Squares F-statistic: 8076.
Date: Fri, 31 May 2024 Prob (F-statistic): 0.00
Time: 16:02:32 Log-Likelihood: -6073.1
No. Observations: 852 AIC: 1.216e+04
Df Residuals: 844 BIC: 1.220e+04
Df Model: 7
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
const -7348.4811 1076.201 -6.828 0.000 -9460.826 -5236.137
NIFTY50 1.2136 0.023 53.511 0.000 1.169 1.258
BITCOIN 0.0907 0.004 25.634 0.000 0.084 0.098
ETHEREUM -1.2318 0.062 -19.945 0.000 -1.353 -1.111
S&P 500 -0.0142 0.113 -0.126 0.899 -0.235 0.207
GOLD -1.9828 0.160 -12.418 0.000 -2.296 -1.669
INR/USD 4.257e+05 6.63e+04 6.417 0.000 2.95e+05 5.56e+05
Brent Crude -132.3793 12.898 -10.264 0.000 -157.695 -107.064
==============================================================================
Omnibus: 25.278 Durbin-Watson: 0.118
Prob(Omnibus): 0.000 Jarque-Bera (JB): 31.324
Skew: 0.332 Prob(JB): 1.58e-07
Kurtosis: 3.664 Cond. No. 2.54e+08
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.54e+08. This might indicate that there are
strong multicollinearity or other numerical problems.
Interpretation of the OLS Regression Results¶
The Ordinary Least Squares (OLS) regression model examines the relationship between the portfolio value (dependent variable) and several independent variables: NIFTY50, BITCOIN, ETHEREUM, S&P 500, GOLD, INR/USD exchange rate, and Brent Crude Oil. The model aims to understand how fluctuations in these variables influence the portfolio value over time.
Key Statistics:
- R-squared: 0.985: This indicates that 98.5% of the variability in the portfolio value can be explained by the independent variables included in the model. This high R-squared value suggests a strong fit of the model to the data.
- F-statistic: 8076.0 (Prob: 0.00): This indicates that the overall model is statistically significant and the independent variables, as a group, are good predictors of the portfolio value.
Coefficients and Significance:
- Constant (Intercept): -7372: The negative value here represents the baseline level of the portfolio value when all independent variables are zero. However, this value is not particularly meaningful in this context.
- NIFTY50: 1.21: A 1 unit increase in the NIFTY50 index is associated with an increase of approximately 1.2138 units in the portfolio value. This coefficient is highly significant (p < 0.001).
- BITCOIN: 0.09: A 1 unit increase in the Bitcoin price is associated with an increase of approximately 0.0907 units in the portfolio value. This coefficient is also highly significant (p < 0.001).
- ETHEREUM: -1.23: A 1 unit increase in the Ethereum price is associated with a decrease of approximately 1.2320 units in the portfolio value, indicating an inverse relationship. This coefficient is highly significant (p < 0.001).
- S&P 500: -0.01: The S&P 500 coefficient is not significant (p = 0.899), indicating that changes in the S&P 500 index do not have a statistically significant impact on the portfolio value in this model.
- GOLD: -1.98: A 1 unit increase in the Gold price is associated with a decrease of approximately 1.9831 units in the portfolio value. This coefficient is highly significant (p < 0.001).
- INR/USD: 4.271e+05: A 1 unit increase in the INR/USD exchange rate is associated with a very large increase in the portfolio value (coefficient: 425,800). This indicates that as the USD strengthens relative to INR, the portfolio value increases significantly. This coefficient is highly significant (p < 0.001).
- Brent Crude: -132.81: A 1 unit increase in the Brent Crude Oil price is associated with a decrease of approximately 132.3982 units in the portfolio value. This coefficient is highly significant (p < 0.001).
Key Findings:
Influence of Major Stock Indices:
- The NIFTY50 index has a significant positive impact on the portfolio value, suggesting that as the NIFTY50 rises, so does the portfolio value. However, the S&P 500 index does not show a significant influence on the portfolio value in this model.
Impact of Cryptocurrencies:
- Bitcoin has a positive and significant impact on the portfolio value, whereas Ethereum has a significant negative impact. This suggests that while Bitcoin's increase can enhance the portfolio's value, Ethereum's increase might decrease it. This could imply that the investments or sectors in the portfolio are negatively correlated with Ethereum.
Effect of Commodities and Forex:
- Gold and Brent Crude Oil prices negatively impact the portfolio value. This indicates that increases in these commodity prices might lead to a decrease in the portfolio value, or as the portfolio values reduce the prices of Gold and Brent Crude go up.
- The INR/USD exchange rate has a significant positive impact on the portfolio value, indicating that the strengthening of the USD against the INR increases the portfolio's value. In the subsequent section, I will conduct a study to understand the lagging 3-day effect of the INR/USD exchange rate on the portfolio value.
Research Question2:¶
What is the impact of the INR/USD exchange rate on the value of investment portfolios in India, considering immediate and lagged effects?¶
To examine how changes in the strength of the Indian Rupee against the US Dollar affect portfolio values. This is done by performing regression analysis which assesses the relationship between INR/USD exchange rate fluctuations and portfolio returns.
The dependent variable was the portfolio returns, while the independent variables included the current and lagged INR/USD exchange rate returns (up to three days). The inclusion of lagged variables helps capture the extended influence of exchange rate changes. Significant coefficients with their respective t-values and p-values indicate the strength and direction of these effects, providing insights into the temporal impact on portfolio returns.
# Calculate daily returns for the portfolio and INR/USD exchange rate
portfolio_return = combined_data['Portfolio Value'].pct_change()
inr_usd_return = combined_data['INR/USD'].pct_change()
# Drop NA values if any from pct_change
data = pd.DataFrame({
'Portfolio Return': portfolio_return,
'INRUSD Return': inr_usd_return
}).dropna()
# Create lagged variables for INR/USD return without storing in the combined DataFrame
data['INRUSD Return Lag1'] = data['INRUSD Return'].shift(1)
data['INRUSD Return Lag2'] = data['INRUSD Return'].shift(2)
data['INRUSD Return Lag3'] = data['INRUSD Return'].shift(3)
# Drop rows with NA values resulting from lagged variables
data.dropna(inplace=True)
# Define the dependent variable and independent variables
y = data['Portfolio Return']
X = data[['INRUSD Return', 'INRUSD Return Lag1', 'INRUSD Return Lag2', 'INRUSD Return Lag3']]
# Add a constant to the model (intercept)
X = sm.add_constant(X)
# Create and fit the OLS model
model = sm.OLS(y, X).fit()
# Print the summary of the model
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Portfolio Return R-squared: 0.038
Model: OLS Adj. R-squared: 0.033
Method: Least Squares F-statistic: 8.268
Date: Fri, 31 May 2024 Prob (F-statistic): 1.53e-06
Time: 16:02:33 Log-Likelihood: 2921.6
No. Observations: 848 AIC: -5833.
Df Residuals: 843 BIC: -5809.
Df Model: 4
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 0.0006 0.000 2.186 0.029 5.98e-05 0.001
INRUSD Return 0.5647 0.116 4.848 0.000 0.336 0.793
INRUSD Return Lag1 0.2204 0.117 1.889 0.059 -0.009 0.449
INRUSD Return Lag2 -0.0492 0.117 -0.421 0.674 -0.278 0.180
INRUSD Return Lag3 0.2981 0.116 2.560 0.011 0.070 0.527
==============================================================================
Omnibus: 156.135 Durbin-Watson: 2.128
Prob(Omnibus): 0.000 Jarque-Bera (JB): 800.788
Skew: -0.734 Prob(JB): 1.29e-174
Kurtosis: 7.529 Cond. No. 461.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Interpretation of the OLS Regression Results¶
The OLS regression model examines the relationship between portfolio returns (dependent variable) and the returns of the INR/USD exchange rate, including its lagged values (independent variables). This analysis aims to understand how current and past changes in the INR/USD exchange rate influence portfolio returns.
R-squared: 0.038: This indicates that 3.8% of the variability in portfolio returns can be explained by the independent variables included in the model. While this value is relatively low, it still suggests that there is some explanatory power in the model.
Coefficients and Significance:
- Constant (Intercept): 0.0006**: This indicates a small positive baseline impact on the portfolio returns, with a t-value of 2.186 and a p-value of 0.029, making it statistically significant at the 5% level.
- INR/USD Return: 0.5647: A 1% increase in the INR/USD exchange rate is associated with a 0.5647% increase in portfolio returns. This relationship is highly significant, with a t-value of 4.848 and a p-value of 0.000.
- INR/USD Return Lag1: 0.22: The return from one day prior has a positive impact on the portfolio returns, with a t-value of 1.889 and a p-value of 0.059. This result is marginally significant, suggesting a potential lagged effect.
- INR/USD Return Lag2: -0.05: The return from two days prior shows a small negative impact on portfolio returns, but this relationship is not statistically significant, with a t-value of -0.421 and a p-value of 0.674.
- INR/USD Return Lag3: 0.30: The return from three days prior has a positive impact on portfolio returns, with a t-value of 2.560 and a p-value of 0.011, making it statistically significant.
Key Findings:
- Immediate Effect: The current day's INR/USD return has a strong positive influence on the portfolio returns. A 1% increase in the exchange rate leads to a 0.5647% increase in portfolio returns. This suggests that the portfolio is positively sensitive to immediate changes in the exchange rate.
- Lagged Effects: The first and third lagged values of the INR/USD return show significant positive impacts on portfolio returns, indicating that the effect of exchange rate changes can persist for up to three days. The second lagged value is not significant, suggesting an inconsistent impact from two days prior.
- Durbin-Watson Statistic: 2.128: This value is close to 2, suggesting that there is no strong evidence of autocorrelation in the residuals of the model.
This highlights the importance of monitoring exchange rate movements for managing and optimizing the portfolio. The significant coefficients and their corresponding t-values suggest that both immediate and certain lagged effects of the INR/USD exchange rate are crucial determinants of portfolio performance.
¶
Research Question 3:¶
How does the volatility of cryptocurrencies (Bitcoin and Ethereum) compare to traditional assets (stocks, gold, and crude oil) in a diversified investment portfolio, and what is the impact of including these volatile assets on the overall portfolio risk?¶
The objective that I am trying to achieve here is to
- Examine the volatility and return of different asset classes
- Determine the risk adjusted return using Sharpe Ratio
Risk-adjusted returns are a measure of the return on an investment relative to the amount of risk taken to achieve that return. In essence, it helps investors understand how much risk they are taking for the potential reward. Risk-adjusted returns are crucial in evaluating the performance of investments because they provide a more accurate picture of an investment's profitability when accounting for risk.
There are several methods and ratios to calculate risk-adjusted returns, with some of the most common being Sharpe Ratio. Sharpe Ratio measures the excess return per unit of risk. It helps investors understand whether the returns of a portfolio are due to smart investment decisions or a result of excessive risk. A higher Sharpe Ratio indicates better risk-adjusted returns.
The formula of calculation is Sharpe Ratio = (portfolio return - risk-free rate of return)/ standard deviation of the portfolio returns
In this project the risk free return is taken at 6% which is based on the Indian 10-year bond yield and fixed deposit rates offered by banks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the uploaded data
#file_path = '/mnt/data/combined_data.csv'
#combined_data = pd.read_csv(file_path, index_col=0, parse_dates=True)
# Calculate daily returns and cumulative returns
daily_returns = combined_data.pct_change().dropna()
cumulative_returns = (1 + daily_returns).cumprod() - 1
# Calculate metrics
mean_daily_return = (daily_returns.mean() * 100).round(2)
annualized_return = (mean_daily_return * 365).round(2)
daily_volatility = (daily_returns.std() * 100).round(2)
annualized_volatility = (daily_volatility * np.sqrt(365)).round(2)
risk_free_rate = 6.0 # 6% per annum
sharpe_ratio = ((annualized_return - risk_free_rate) / annualized_volatility).round(2)
# Create DataFrame for metrics
metrics = pd.DataFrame({
'Mean Daily Return (%)': mean_daily_return,
'Annualized Return (%)': annualized_return,
'Daily Volatility (%)': daily_volatility,
'Annualized Volatility (%)': annualized_volatility,
'Sharpe Ratio': sharpe_ratio
}).sort_values(by='Annualized Return (%)', ascending=False)
# Plot cumulative returns as a heatmap with specific labels for each quarter
plt.figure(figsize=(12, 6))
sns.heatmap(cumulative_returns.T, cmap="YlGnBu", cbar_kws={'label': 'Cumulative Return'})
# Set custom x-tick labels for each quarter
quarters = ['Jan 2022', 'Apr 2022', 'Jul 2022', 'Oct 2022', 'Jan 2023', 'Apr 2023', 'Jul 2023', 'Oct 2023', 'Jan 2024', 'Apr 2024']
plt.xticks(ticks=np.linspace(0, len(cumulative_returns.index)-1, len(quarters)), labels=quarters, rotation=45)
plt.title('Cumulative Returns of Various Asset Classes')
plt.xlabel('Date')
plt.ylabel('Asset Classes')
plt.show()
# Plot annualized return and volatility for comparison
fig, ax = plt.subplots(figsize=(14, 7))
width = 0.35
x = np.arange(len(metrics.index))
# Plotting bars
rects1 = ax.bar(x - width/2, metrics['Annualized Return (%)'], width, label='Annualized Return (%)', color='orange')
rects2 = ax.bar(x + width/2, metrics['Annualized Volatility (%)'], width, label='Annualized Volatility (%)', color='skyblue')
# Adding text for labels, title and custom x-axis tick labels
ax.set_xlabel('Assets')
ax.set_ylabel('')
ax.set_title('Annualized Return and Volatility Comparison')
ax.set_xticks(x)
ax.set_xticklabels(metrics.index)
ax.legend()
# Remove y-axis values
ax.yaxis.set_visible(False)
# Display percentage on top of the bars
for rect in rects1 + rects2:
height = rect.get_height()
ax.annotate(f'{height:.2f}%',
xy=(rect.get_x() + rect.get_width() / 2, height),
xytext=(0, 3),
textcoords="offset points",
ha='center', va='bottom')
fig.tight_layout()
plt.show()
#Print the Dataframe
metrics
| Mean Daily Return (%) | Annualized Return (%) | Daily Volatility (%) | Annualized Volatility (%) | Sharpe Ratio | |
|---|---|---|---|---|---|
| BITCOIN | 0.07 | 25.55 | 2.90 | 55.40 | 0.35 |
| Portfolio Value | 0.05 | 18.25 | 0.79 | 15.09 | 0.81 |
| ETHEREUM | 0.04 | 14.60 | 3.61 | 68.97 | 0.12 |
| NIFTY50 | 0.03 | 10.95 | 0.71 | 13.56 | 0.37 |
| NIFTY100 | 0.03 | 10.95 | 0.71 | 13.56 | 0.37 |
| GOLD | 0.03 | 10.95 | 0.74 | 14.14 | 0.35 |
| S&P 500 | 0.01 | 3.65 | 0.97 | 18.53 | -0.13 |
| Brent Crude | 0.01 | 3.65 | 3.55 | 67.82 | -0.03 |
| INR/USD | -0.01 | -3.65 | 0.23 | 4.39 | -2.20 |
Interpretation¶
The heatmap above illustrates the cumulative returns of various asset classes from January 2022 to April 2024, providing a comprehensive visual representation of their performance over time. Here are the key takeaways from the heatmap:
The analysis of the volatility and returns of various asset classes—cryptocurrencies, equities, commodities, forex, and a mixed portfolio of Indian stocks—provides insights into their performance and risk profiles. Here's a concise summary:
Portfolio (Indian Stocks):
- Volatility: The portfolio has a moderate daily volatility (0.79%) and annualized volatility (15.09%), indicating a balanced level of risk compared to other assets.
- Returns: The portfolio shows a moderate mean daily return (0.05%) and a high annualized return (18.25%).
- Sharpe Ratio: The portfolio's Sharpe Ratio is 0.81, the highest among all assets, indicating the best risk-adjusted return.
High-Volatility Assets (Cryptocurrencies):
- Bitcoin (BTC) and Ethereum (ETH) exhibit very high daily and annualized volatility (BTC: 2.90%, 55.40%; ETH: 3.61%, 68.97%), indicating significant price fluctuations and higher risk.
- Both cryptocurrencies show high mean daily and annualized returns (BTC: 0.07%, 25.55%; ETH: 0.04%, 14.60%), reflecting their potential for substantial growth.
- Despite their high returns, the Sharpe Ratios for BTC (0.35) and ETH (0.12) are relatively low, indicating that the high returns are largely offset by the high risk.
Traditional Low-Volatility Assets:
- Equities:
- NIFTY50 and NIFTY100 have moderate volatility (NIFTY50: 0.71%, 13.56%; NIFTY100: 0.71%, 13.56%) and good risk-adjusted returns with Sharpe Ratios of 0.37 for both. The S&P 500 has a lower Sharpe Ratio (-0.13).
- Commodities:
- Gold exhibits low volatility (0.74%, 14.14%) and a relatively high Sharpe Ratio (0.35), making it a stable investment with good risk-adjusted returns.
- Brent Crude shows high volatility (3.55%, 67.82%) and a low Sharpe Ratio (-0.03), indicating significant risk with low risk-adjusted returns.
- Forex (INR/USD):
- The INR/USD exchange rate has the lowest volatility (0.23%, 4.39%) but a negative return and Sharpe Ratio (-2.20), indicating poor performance and risk-adjusted return.
- Equities:
Comparative Insights:¶
- Portfolio Value: The mixed Indian stocks portfolio has the highest Sharpe Ratio, indicating the best risk-adjusted return. It offers a balanced risk-return profile with high returns and moderate volatility.
- Cryptocurrencies offer high potential returns but come with very high risk, making them suitable for investors with a high risk tolerance.
- Equities and Gold provide a balanced risk-return profile with moderate returns and volatility, and relatively high risk-adjusted returns, making them suitable for more conservative investors.
- Brent Crude has high volatility with low risk-adjusted returns, indicating significant risk.
- INR/USD shows stability in terms of volatility but performs poorly in terms of returns and risk-adjusted metrics.
Impact on Overall Portfolio Risk:¶
- Including high-volatility assets like Bitcoin and Ethereum in a diversified investment portfolio can significantly increase the overall portfolio risk due to their substantial price fluctuations.
- However, these assets also offer high potential returns, which can enhance the portfolio's overall return, particularly if their price movements are not perfectly correlated with traditional assets.
- To mitigate the increased risk, it's essential to balance the portfolio with low-volatility assets such as equities and gold, which provide stable returns and lower risk.
Research Question 4:¶
What is the optimal asset allocation strategy for maximizing returns while minimizing risk in an Indian stock portfolio?¶
The goal of this analysis is to determine the optimal asset allocation strategy for an Indian stock portfolio that maximizes returns while minimizing risk. This is achieved through portfolio optimization, which involves calculating the optimal weights of different assets to maximize the Sharpe Ratio, a measure of risk-adjusted return. Additionally, the efficient frontier is plotted to visualize the trade-off between risk and return for different portfolios.
Optimization Process & Optimal Portfolio: The optimization process utilizes the scipy.optimize module's Sequential Least Squares Programming (SLSQP) method to maximize the Sharpe Ratio, which is the portfolio's excess return over the risk-free rate divided by its volatility. The optimization ensures that the sum of the asset weights equals 1 and that each weight is between 0 and 1, prohibiting short selling or leveraging. The resulting optimal portfolio allocates significant weights to the portfolio value (Indian stocks), gold, and Bitcoin, while other assets receive zero allocations. This allocation strategy is designed to achieve the highest risk-adjusted return, aligning with the goal of maximizing the Sharpe Ratio.
Plotting the Efficient Frontier Plot:
Efficient Frontier is a concept from modern portfolio theory that represents the set of optimal portfolios offering the highest expected return for a defined level of risk or the lowest risk for a given level of expected return. Portfolios that lie on the Efficient Frontier are considered optimal because they offer the best possible returns for the given level of risk. Efficient Frontier plot shows the relationship between risk (volatility) and return for a large number of randomly generated portfolios.
# Calculate daily returns
daily_returns = combined_data.pct_change().dropna()
# Calculate mean daily returns and annualized returns
mean_daily_returns = daily_returns.mean()
annualized_returns = mean_daily_returns * 252
# Calculate covariance matrix of daily returns
cov_matrix = daily_returns.cov() * 252
# Number of assets
num_assets = len(mean_daily_returns)
# Portfolio optimization functions
def portfolio_performance(weights, mean_returns, cov_matrix):
returns = np.dot(weights, mean_returns)
volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return returns, volatility
def negative_sharpe_ratio(weights, mean_returns, cov_matrix, risk_free_rate):
p_returns, p_volatility = portfolio_performance(weights, mean_returns, cov_matrix)
return - (p_returns - risk_free_rate) / p_volatility
def optimize_portfolio(mean_returns, cov_matrix, risk_free_rate):
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
bounds = tuple((0, 1) for asset in range(num_assets))
initial_weights = num_assets * [1. / num_assets,]
result = sco.minimize(negative_sharpe_ratio, initial_weights, args=(mean_returns, cov_matrix, risk_free_rate),
method='SLSQP', bounds=bounds, constraints=constraints)
return result
# Risk-free rate (6% per annum, approximated as 0.06)
risk_free_rate = 0.06
# Optimize portfolio to find the optimal weights
optimal_result = optimize_portfolio(annualized_returns, cov_matrix, risk_free_rate)
optimal_weights = optimal_result.x
# Calculate optimal portfolio performance
optimal_return, optimal_volatility = portfolio_performance(optimal_weights, annualized_returns, cov_matrix)
optimal_sharpe_ratio = (optimal_return - risk_free_rate) / optimal_volatility
# Display optimal weights in percentages and sort in descending order
optimal_portfolio = pd.DataFrame(optimal_weights, index=mean_daily_returns.index, columns=['Weight'])
optimal_portfolio['Weight'] = (optimal_portfolio['Weight'] * 100).round(2)
optimal_portfolio = optimal_portfolio.sort_values(by='Weight', ascending=False)
print("Optimal Portfolio Weights (%):\n")
# Plot the Efficient Frontier
def plot_efficient_frontier(mean_returns, cov_matrix, risk_free_rate, num_portfolios=10000):
results = np.zeros((3, num_portfolios))
weights_record = []
for i in range(num_portfolios):
weights = np.random.random(num_assets)
weights /= np.sum(weights)
returns, volatility = portfolio_performance(weights, mean_returns, cov_matrix)
sharpe_ratio = (returns - risk_free_rate) / volatility
results[0,i] = returns
results[1,i] = volatility
results[2,i] = sharpe_ratio
weights_record.append(weights)
results_frame = pd.DataFrame(results.T, columns=['Returns', 'Volatility', 'Sharpe Ratio'])
plt.figure(figsize=(14, 7))
plt.scatter(results_frame['Volatility'], results_frame['Returns'], c=results_frame['Sharpe Ratio'], cmap='Spectral')
plt.colorbar(label='Sharpe Ratio')
plt.scatter(optimal_volatility, optimal_return, c='red', marker='*', s=200)
plt.title('Efficient Frontier')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.show()
# Plot the efficient frontier
plot_efficient_frontier(annualized_returns, cov_matrix, risk_free_rate)
Optimal Portfolio Weights (%):
optimal_portfolio
| Weight | |
|---|---|
| Portfolio Value | 70.20 |
| GOLD | 21.25 |
| BITCOIN | 8.55 |
| NIFTY50 | 0.00 |
| NIFTY100 | 0.00 |
| ETHEREUM | 0.00 |
| S&P 500 | 0.00 |
| INR/USD | 0.00 |
| Brent Crude | 0.00 |
Based on the optimization, the optimal weights for the portfolio are as follows¶
- Portfolio (Indian Stocks): The largest portion of the portfolio (70%) is allocated to the exisitng portfolio.
- BITCOIN: A significant portion (9%) is allocated to Bitcoin, indicating a belief in its potential for high returns despite its volatility.
- GOLD: Another large allocation (21%) is towards gold, which is often seen as a safe-haven asset.
- Other Assets: The weights for other assets like NIFTY50, NIFTY100, Ethereum, S&P 500, INR/USD, and Brent Crude are effectively zero, indicating that these assets do not contribute to the optimal risk-return trade-off for the portfolio I hold currently.
Efficient Frontier Interpretation¶
The Efficient Frontier plot visualizes the risk-return trade-off for various portfolio combinations, where each colored scatter point represents a portfolio with different asset weights. The color of these points indicates the Sharpe Ratio, with warmer colors representing higher Sharpe Ratios. The red star marks the optimal portfolio, which has the highest Sharpe Ratio, thus maximizing returns for a given level of risk. This optimal portfolio effectively balances risk and return, making it a recommended choice for investors aiming to maximize their risk-adjusted returns.
Key insights from the plot highlight the risk-return trade-off and the benefits of diversification. The efficient frontier shows that moving right along the frontier increases both risk and expected return, allowing investors to select a point based on their risk tolerance and return objectives. Portfolios below the efficient frontier are sub-optimal, offering lower returns for the same risk compared to those on the frontier. The plot also demonstrates that a well-diversified portfolio can achieve better risk-return profiles than individual assets, emphasizing the advantages of diversification in portfolio management.
Key Takeaways¶
Diversification Strategy: Implement a diversified investment strategy that prioritizes Indian stocks and includes allocations in gold and Bitcoin to achieve the best risk-adjusted returns.
Monitoring Exchange Rates: Regularly monitor the INR/USD exchange rate and consider both current and lagged effects to adjust the portfolio accordingly, as currency fluctuations significantly impact portfolio value.
Risk Management: Incorporate traditional low-volatility assets such as equities and gold to balance the high potential returns and significant risks associated with cryptocurrencies like Bitcoin and Ethereum.
Continual Reassessment: Periodically reassess the portfolio allocation using the efficient frontier plot to ensure the portfolio remains optimized for the best risk-return trade-off, considering changing market conditions and asset performance.
Conclusion¶
This comprehensive analysis of my personal investment portfolio underscores the critical role of diversification and strategic asset allocation in optimizing returns while managing risk. The study revealed that a well-diversified portfolio, predominantly composed of Indian stocks, with significant allocations to gold and Bitcoin, offers the best risk-adjusted returns. The regression analysis highlighted the significant impact of various factors such as market indices, cryptocurrencies, commodities, and exchange rates on portfolio performance. Notably, the INR/USD exchange rate emerged as a crucial determinant of portfolio value, indicating the need for investors to stay vigilant and responsive to currency fluctuations.
By employing an efficient frontier plot, I identified the optimal portfolio configuration, characterized by the highest Sharpe Ratio. This optimal mix includes 70% in Indian stocks, 9% in Bitcoin, and 21% in gold, effectively balancing risk and return. The insights gained from this study emphasize the importance of continuous portfolio reassessment and adaptation to changing market conditions. Additionally, the findings suggest that incorporating a range of asset classes, including traditional low-volatility assets and high-potential cryptocurrencies, can enhance portfolio performance. Going forward, future research could delve deeper into the impact of other macroeconomic factors and explore the potential of alternative investments, providing a more comprehensive understanding of optimal asset allocation strategies.
Exporting the combined_data to CSV for Assignment 3¶
# First, make sure the index is named 'Date'
combined_data.index.name = 'Date'
# Convert the index to datetime format if it's not already
combined_data.index = pd.to_datetime(combined_data.index)
# Format the date in DD/MM/YY format
combined_data.index = combined_data.index.strftime('%d/%m/%y')
# Save the DataFrame to a CSV file
combined_data.to_csv('/Users/rohan/Downloads/combined_data.csv', index=True)
Out of Core processing - Develop a Trend Indicator for NIFTY 50¶
The NIFTY 50 index dataset used here comprises minute-level data from 2015 onwards. The dataset is substantial, with over 800,000 rows, making it impractical to load entirely into memory for processing. To handle this large volume of data, out-of-core processing techniques were employed. The data was processed in manageable chunks of 10,000 rows at a time. Each chunk was read, cleaned, and aggregated independently before combining the results. This approach allowed to work with large datasets without exceeding memory limits.
Initialization and Reading Data in Chunks¶
In this step, I defined the file path and initialize an empty DataFrame to store the results. Set a chunk size to manage memory constraints and create an iterator to read the CSV file in chunks. This is the core of out-of-core processing as it allows us to handle large datasets that cannot fit into memory at once.
# Define the file path
file_path = '/Users/rohan/Downloads/NIFTY 50 - Minute data.csv'
# Initialize an empty DataFrame to store results
result_df = pd.DataFrame()
# Define the chunk size
chunk_size = 10000
# Create an iterator to read the file in chunks
chunk_iter = pd.read_csv(file_path, chunksize=chunk_size, parse_dates=['date'])
Processing Each Chunk¶
In this step, initialized a list to store hourly aggregated data. Then iterate over each chunk, dropping rows with missing values and grouping the data by hour to calculate the mean close price. This approach optimizes memory usage and processing time by handling the data in manageable chunks rather than loading the entire dataset into memory.
# Initialize lists to store hourly aggregated data
hourly_agg_list = []
# Process each chunk
for chunk in chunk_iter:
# Drop rows with missing values
chunk = chunk.dropna()
# Group by hour and calculate mean close price
chunk['hour'] = chunk['date'].dt.floor('H')
hourly_agg = chunk.groupby('hour').agg(mean_close=('close', 'mean')).reset_index()
hourly_agg_list.append(hourly_agg)
Combining and Calculating Moving Averages¶
Combine the hourly aggregated data from all chunks into a single DataFrame. Then calculated the 52-week and 12-hour exponential moving averages (EMAs). The 52-week EMA considers approximately 1690 trading hours, while the 12-hour EMA provides a shorter-term trend indicator. These calculations are done on the combined dataset, ensuring that we have the full context of the data.
# Combine the hourly aggregated data
hourly_agg_df = pd.concat(hourly_agg_list, ignore_index=True)
# Calculate the 52-week moving average
# Note: There are approximately 52*5*6.5=1690 trading hours in 52 weeks (considering 5 trading days per week and 6.5 trading hours per day)
hourly_agg_df['EMA_52WK'] = hourly_agg_df['mean_close'].ewm(span=1690, adjust=False).mean()
# Calculate the 12-hour moving average
hourly_agg_df['EMA_12HR'] = hourly_agg_df['mean_close'].ewm(span=12, adjust=False).mean()
Generating Buy/Sell Signals and Plotting¶
Generated buy and sell signals based on the crossover of the two EMAs. A buy signal is generated when the 12-hour EMA crosses above the 52-week EMA, and a sell signal is generated when the 12-hour EMA crosses below the 52-week EMA. We store these signals in the Signal column and calculate the Position column to identify changes in signals. The plot visually demonstrates the crossover strategy, highlighting potential buy and sell opportunities. This visualization helps in understanding the effectiveness of the trading strategy and provides insights into market trends.
# Generate buy/sell signals
hourly_agg_df['Signal'] = 0
hourly_agg_df['Signal'] = (hourly_agg_df['EMA_12HR'] > hourly_agg_df['EMA_52WK']).astype(int)
hourly_agg_df['Position'] = hourly_agg_df['Signal'].diff()
# Plot the results
plt.figure(figsize=(14, 8))
plt.plot(hourly_agg_df['hour'], hourly_agg_df['EMA_52WK'], label='52-Week EMA', color='orange', linewidth=2)
plt.plot(hourly_agg_df['hour'], hourly_agg_df['EMA_12HR'], label='12-Hour EMA', color='skyblue', linewidth=2)
# Plot buy signals
buy_signals = hourly_agg_df[hourly_agg_df['Position'] == 1]
sell_signals = hourly_agg_df[hourly_agg_df['Position'] == -1]
plt.plot(buy_signals['hour'], buy_signals['mean_close'], '^', markersize=10, color='green', lw=0, label='Buy Signal')
plt.plot(sell_signals['hour'], sell_signals['mean_close'], 'v', markersize=10, color='red', lw=0, label='Sell Signal')
plt.title('NIFTY 50 EMA Crossover Strategy')
plt.xlabel('Hour')
plt.ylabel('Price')
plt.legend()
plt.grid()
plt.show()
# Print the last row to see the latest signal
print(hourly_agg_df.tail())
hour mean_close EMA_52WK EMA_12HR Signal Position 15705 2024-01-25 11:00:00 21299.396667 19531.246861 21365.047838 1 0.0 15706 2024-01-25 12:00:00 21287.076667 19533.323537 21353.052273 1 0.0 15707 2024-01-25 13:00:00 21278.865000 19535.388045 21341.638847 1 0.0 15708 2024-01-25 14:00:00 21275.080000 19537.445635 21331.399024 1 0.0 15709 2024-01-25 15:00:00 21348.885000 19539.588082 21334.089174 1 0.0
Interpretation:¶
- Uptrend: When the 12 HR EMA of the NIFTY 50 index is near or above the 52-week high, it indicates a strong uptrend. It shows that the index has been performing well over the past year and is reaching new highs, which can be a signal of a bullish market.
- Resistance Level: The 52-week high often acts as a resistance level. If the index approaches this level but fails to break through, it may indicate strong resistance and potential for a price pullback.
- Breakout: If the index breaks above the 52-week high, it is considered a bullish signal, suggesting that the market sentiment is strong and the price may continue to rise.
- Downtrend or Reversal: If the index is consistently below its 52-week high, it may indicate a downtrend or a bearish market sentiment. The further the price is from the 52-week high, the weaker the market sentiment migh
Currently the broader Indian stock market is in a bull run.
Demonstration of Out-of-Core Processing:¶
In this demonstration, I used chunk processing to read and aggregate NIFTY 50 index minute data for 9 years.
Query Performance Issues and Optimizations:¶
- Chunk Size: The chunk size is set based on memory constraints to balance memory usage and processing speed. Adjusting the chunk size can optimize performe2cy.
- Dropping Missing Values: Dropping rows with missing values in each chunk ensures cleaner data and reduces errors in calculati3ns.
- Aggregations: Performing hourly aggregations in chunks and then combining the results minimizes the processing load at any single point, leading to more efficient computat3on.
- Vectorized Operations: Using vecto operations like ewm for calculating ting the 52-week high ensures faster execution compared to iterative methods.
Vectorized Operations used in the Code:
- Calculation of Exponential Moving Averages (EMAs): The ewm meththod and the subsequent .mean() method are vectorized operation . These methods operate on the entirmean_close column at once, rather than iterating through each value individually.
- Generation of Buy/Sell Signals:
- The comparison (hourly_agg_df[EMA_12HR] > hourly_agg_df[EMA_52WK]) is a vectorized operation. It compares each element in the EMA_12HR column with the corresponding element in the EMA_52WK column simultaneously.
- The .astype(int` method converts the boolean results of the comparison into integers (0 or 1) in a vectorized manner.
- The .diff() method calculates the difference between consecutive elements in the Signal column in a vectorized way.
These vectorized operations significantly enhance the performance and efficiency of the code by leveraging pandas optimized functions to handle operations on entire columns of data simultaneously. neously. be.
References¶
Filis, G., Degiannakis, S., & Floros, C. (2011). Dynamic correlation between stock market and oil prices: The case of oil-importing and oil-exporting countries. International Review of Financial Analysis, 20(3), 152-164. https://doi.org/10.1016/j.irfa.2011.02.014
Baur, D. G., & Lucey, B. M. (2010). Is gold a hedge or a safe haven? An analysis of stocks, bonds and gold. Financial Review, 45(2), 217-229. https://doi.org/10.1111/j.1540-6288.2010.00244.x
Jain, S., & Biswal, P. C. (2016). Dynamic linkages among oil price, gold price, exchange rate, and stock market in India. Resources Policy, 49, 179-185. https://doi.org/10.1016/j.resourpol.2016.05.008
Campbell, J. Y., & Thompson, S. B. (2008). Predicting excess stock returns out of sample: Can anything beat the historical average? The Review of Financial Studies, 21(4), 1509-1531. https://doi.org/10.1093/rfs/hhm055
Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271-1291. https://doi.org/10.1080/14697688.2019.1571683
Corbet, S., Lucey, B., & Yarovaya, L. (2018). Datestamping the Bitcoin and Ethereum bubbles. Finance Research Letters, 26, 81-88. https://doi.org/10.1016/j.frl.2017.12.006
DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? The Review of Financial Studies, 22(5), 1915-1953. https://doi.org/10.1093/rfs/hhm075
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77-91. https://doi.org/10.1111/j.1540-6261.1952.tb01525.x
Jorion, P. (2007). Value at Risk: The new benchmark for managing financial risk. McGraw-Hill Education.
Pan, M. S., Fok, R. C., & Liu, Y. A. (2007). Dynamic linkages between exchange rates and stock prices: Evidence from East Asian markets. International Review of Economics & Finance, 16(4), 503-520. https://doi.org/10.1016/j.iref.2005.09.003
Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83, 187-205. https://doi.org/10.1016/j.eswa.2017.04.030
Cao, L., & Parry, M. (2009). Portfolio management using reinforcement learning with transaction costs. Journal of Financial Markets, 12(4), 450-480. https://doi.org/10.1016/j.finmar.2009.04.001
#Sunday 10PM